Overview

Dataset statistics

Number of variables: 10
Number of observations: 5705
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 445.8 KiB
Average record size in memory: 80.0 B
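The memory figures can be sanity-checked with a little arithmetic. This sketch assumes every column uses an 8-byte numeric type (float64 or int64), which the report does not state explicitly.

```python
# Sanity check of the memory figures reported above.
# Assumption: every column uses an 8-byte numeric dtype (float64 or int64).
N_ROWS = 5705
N_COLS = 10
BYTES_PER_VALUE = 8

record_bytes = N_COLS * BYTES_PER_VALUE       # bytes per row across all columns
column_kib = N_ROWS * BYTES_PER_VALUE / 1024  # one column, values only
total_kib = N_ROWS * record_bytes / 1024      # whole table, values only

print(f"Average record size: {record_bytes} B")
print(f"Per-column data:     {column_kib:.1f} KiB")
print(f"Total data:          {total_kib:.1f} KiB")
```

The values alone give 80 B per record, 44.6 KiB per column and 445.7 KiB in total. The remaining ~0.1 KiB in the reported 44.7 KiB per variable and 445.8 KiB overall is plausibly index overhead, which pandas includes in memory usage by default.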

Variable types

Numeric: 10

Alerts

gross_revenue is highly correlated with qtde_invoices and 2 other fields [High correlation]
recency_days is highly correlated with df_index and 1 other field [High correlation]
qtde_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 4 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 2 other fields [High correlation]
frequency is highly correlated with qtde_invoices [High correlation]
qtde_returns is highly correlated with qtde_items and 1 other field [High correlation]
avg_ticket is highly correlated with qtde_items and 1 other field [High correlation]
df_index is highly correlated with customer_id and 1 other field [High correlation]
customer_id is highly correlated with df_index and 1 other field [High correlation]
gross_revenue is highly skewed (γ1 = 22.60448601) [Skewed]
qtde_items is highly skewed (γ1 = 24.08929936) [Skewed]
avg_ticket is highly skewed (γ1 = 71.36561628) [Skewed]
qtde_returns is highly skewed (γ1 = 71.12000871) [Skewed]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
customer_id has unique values [Unique]
qtde_returns has 4200 (73.6%) zeros [Zeros]
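A minimal sketch of how a "highly skewed" alert can be derived. It assumes the adjusted Fisher-Pearson estimator that pandas' Series.skew() uses, and a cutoff of 20. The cutoff is an assumption, chosen because it is consistent with the alerts above: gross_revenue (22.60) is flagged, while qtde_products (17.77, reported in its variable section) is not. The skewness values in `reported` are copied from the variable sections.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # constant input: no spread, no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# Hypothetical cutoff; 20 reproduces the four Skewed alerts in this report.
SKEWNESS_THRESHOLD = 20

def skew_alerts(skewness_by_column, threshold=SKEWNESS_THRESHOLD):
    """Names of columns whose absolute skewness exceeds the threshold."""
    return sorted(col for col, g in skewness_by_column.items() if abs(g) > threshold)

reported = {
    "df_index": -0.003583510807,
    "customer_id": 0.4409745831,
    "gross_revenue": 22.60448601,
    "recency_days": 0.8143906836,
    "qtde_invoices": 13.20280307,
    "qtde_items": 24.08929936,
    "qtde_products": 17.76686848,
    "avg_ticket": 71.36561628,
    "frequency": 4.846833994,
    "qtde_returns": 71.12000871,
}
print(skew_alerts(reported))
```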

Reproduction

Analysis started: 2022-11-07 18:20:26.607317
Analysis finished: 2022-11-07 18:20:55.560883
Duration: 28.95 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 5705
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2900.82752
Minimum: 0
Maximum: 5795
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 290.2
Q1: 1457
Median: 2903
Q3: 4348
95th percentile: 5503.8
Maximum: 5795
Range: 5795
Interquartile range (IQR): 2891

Descriptive statistics

Standard deviation: 1671.81192
Coefficient of variation (CV): 0.5763224145
Kurtosis: -1.196226531
Mean: 2900.82752
Median Absolute Deviation (MAD): 1446
Skewness: -0.003583510807
Sum: 16549221
Variance: 2794955.097
Monotonicity: Strictly increasing
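Several of the derived statistics follow directly from the others. The sketch below recomputes them for df_index from the values reported in this section (nothing is recomputed from raw data): CV is the standard deviation divided by the mean, IQR is Q3 − Q1, range is maximum − minimum, and variance is the squared standard deviation.

```python
# Reported values for df_index, copied from the tables above.
mean = 2900.82752
std = 1671.81192
q1, q3 = 1457, 4348
minimum, maximum = 0, 5795

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance from the (rounded) standard deviation

print(f"CV={cv:.10f} IQR={iqr} range={value_range} variance={variance:.3f}")
```

The variance differs from the reported 2794955.097 only in the third decimal, because the standard deviation is itself rounded in the report.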
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
0                     1       < 0.1%
3852                  1       < 0.1%
3872                  1       < 0.1%
3871                  1       < 0.1%
3870                  1       < 0.1%
3869                  1       < 0.1%
3868                  1       < 0.1%
3867                  1       < 0.1%
3866                  1       < 0.1%
3865                  1       < 0.1%
Other values (5695)   5695    99.8%
Minimum 10 values
Value                 Count   Frequency (%)
0                     1       < 0.1%
1                     1       < 0.1%
2                     1       < 0.1%
3                     1       < 0.1%
4                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
7                     1       < 0.1%
8                     1       < 0.1%
9                     1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
5795                  1       < 0.1%
5794                  1       < 0.1%
5793                  1       < 0.1%
5792                  1       < 0.1%
5791                  1       < 0.1%
5790                  1       < 0.1%
5789                  1       < 0.1%
5788                  1       < 0.1%
5787                  1       < 0.1%
5786                  1       < 0.1%

customer_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5705
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16602.92235
Minimum: 12346
Maximum: 22709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 12346
5th percentile: 12697.6
Q1: 14288
Median: 16229
Q3: 18213
95th percentile: 21746.2
Maximum: 22709
Range: 10363
Interquartile range (IQR): 3925

Descriptive statistics

Standard deviation: 2811.170356
Coefficient of variation (CV): 0.1693178043
Kurtosis: -0.8232857661
Mean: 16602.92235
Median Absolute Deviation (MAD): 1964
Skewness: 0.4409745831
Sum: 94719672
Variance: 7902678.772
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
17850                 1       < 0.1%
21071                 1       < 0.1%
17123                 1       < 0.1%
17891                 1       < 0.1%
16498                 1       < 0.1%
13745                 1       < 0.1%
15584                 1       < 0.1%
21089                 1       < 0.1%
21088                 1       < 0.1%
21087                 1       < 0.1%
Other values (5695)   5695    99.8%
Minimum 10 values
Value                 Count   Frequency (%)
12346                 1       < 0.1%
12347                 1       < 0.1%
12348                 1       < 0.1%
12349                 1       < 0.1%
12350                 1       < 0.1%
12352                 1       < 0.1%
12353                 1       < 0.1%
12354                 1       < 0.1%
12355                 1       < 0.1%
12356                 1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
22709                 1       < 0.1%
22708                 1       < 0.1%
22707                 1       < 0.1%
22706                 1       < 0.1%
22705                 1       < 0.1%
22704                 1       < 0.1%
22700                 1       < 0.1%
22699                 1       < 0.1%
22696                 1       < 0.1%
22695                 1       < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5459
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1772.229166
Minimum: 0.42
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 13.32
Q1: 236.3
Median: 612.78
Q3: 1568.23
95th percentile: 5305.816
Maximum: 279138.02
Range: 279137.6
Interquartile range (IQR): 1331.93

Descriptive statistics

Standard deviation: 7575.73186
Coefficient of variation (CV): 4.274690885
Kurtosis: 676.7694665
Mean: 1772.229166
Median Absolute Deviation (MAD): 478.32
Skewness: 22.60448601
Sum: 10110567.39
Variance: 57391713.21
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
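With fixed-size bins, each of the 50 bars covers one fiftieth of the range. Assuming the bins split [minimum, maximum] into equal widths (as numpy.histogram does when given an integer bin count), the sketch below shows why this histogram is dominated by a single bar: the first bin reaches about 5583, beyond the 95th percentile of 5305.816. The inputs are copied from the gross_revenue statistics above.

```python
# Values copied from the gross_revenue statistics above.
minimum = 0.42
maximum = 279138.02
p95 = 5305.816
N_BINS = 50  # "Histogram with fixed size bins (bins=50)"

# Equal-width bins spanning [minimum, maximum] (assumed binning scheme).
bin_width = (maximum - minimum) / N_BINS
first_bin_upper_edge = minimum + bin_width

print(f"bin width: {bin_width:.3f}")
print(f"first bin: [{minimum}, {first_bin_upper_edge:.3f})")
# The 95th percentile falls inside the first bin, so that single bar
# holds at least 95% of the customers.
print(p95 < first_bin_upper_edge)
```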
Common values
Value                 Count   Frequency (%)
7.95                  9       0.2%
2.95                  8       0.1%
4.95                  8       0.1%
1.25                  8       0.1%
3.75                  7       0.1%
1.65                  7       0.1%
12.75                 7       0.1%
4.25                  6       0.1%
5.95                  6       0.1%
7.5                   6       0.1%
Other values (5449)   5633    98.7%
Minimum 10 values
Value                 Count   Frequency (%)
0.42                  1       < 0.1%
0.65                  1       < 0.1%
0.79                  1       < 0.1%
0.84                  4       0.1%
0.85                  3       0.1%
1.07                  1       < 0.1%
1.25                  8       0.1%
1.44                  1       < 0.1%
1.65                  7       0.1%
1.69                  1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
279138.02             1       < 0.1%
259657.3              1       < 0.1%
194550.79             1       < 0.1%
140450.72             1       < 0.1%
124564.53             1       < 0.1%
117379.63             1       < 0.1%
91062.38              1       < 0.1%
77183.6               1       < 0.1%
72882.09              1       < 0.1%
66653.56              1       < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 304
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.9258545
Minimum: 0
Maximum: 373
Zeros: 37
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 23
Median: 71
Q3: 200
95th percentile: 338
Maximum: 373
Range: 373
Interquartile range (IQR): 177
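Quantiles of a discrete column need not be observed values. Assuming the report uses linear interpolation between order statistics (the default of pandas' Series.quantile), the sketch below shows the rule. The eight-customer list is toy data, not taken from this dataset. The same rule explains why integer-valued columns elsewhere in this report show fractional quantiles, such as a 95th percentile of 2926.4 for qtde_items.

```python
import math

def quantile_linear(values, p):
    """p-quantile with linear interpolation between order statistics."""
    if not values:
        raise ValueError("empty input")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    h = (len(xs) - 1) * p  # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def iqr(values):
    """Interquartile range, Q3 - Q1."""
    return quantile_linear(values, 0.75) - quantile_linear(values, 0.25)

# Toy recency values in days for eight customers (not from this dataset).
recency = [1, 3, 4, 10, 23, 71, 200, 338]
print(quantile_linear(recency, 0.05))  # fractional, although every value is an integer
print(quantile_linear(recency, 0.50))
print(iqr(recency))
```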

Descriptive statistics

Standard deviation: 111.5740338
Coefficient of variation (CV): 0.9542289363
Kurtosis: -0.6403097319
Mean: 116.9258545
Median Absolute Deviation (MAD): 61
Skewness: 0.8143906836
Sum: 667062
Variance: 12448.76501
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
1                     110     1.9%
4                     105     1.8%
3                     98      1.7%
2                     92      1.6%
10                    86      1.5%
8                     82      1.4%
9                     79      1.4%
17                    79      1.4%
7                     78      1.4%
15                    67      1.2%
Other values (294)    4829    84.6%
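The "Other values" row is the remainder after the ten most frequent values: its count is the number of observations minus the top-10 counts, and the number in parentheses is the distinct count minus ten. The sketch below reproduces the recency_days row from the counts in the table, and includes a collections.Counter helper (a hypothetical name, `common_values`) for computing the same summary from raw values.

```python
from collections import Counter

def common_values(values, k=10):
    """Return the k most frequent (value, count) pairs plus the 'other' summary."""
    counts = Counter(values)
    top = counts.most_common(k)
    other_distinct = len(counts) - len(top)
    other_count = len(values) - sum(count for _, count in top)
    return top, other_distinct, other_count

# Summary figures and top-10 counts for recency_days, copied from the report.
N_OBSERVATIONS = 5705
N_DISTINCT = 304
top10 = [(1, 110), (4, 105), (3, 98), (2, 92), (10, 86),
         (8, 82), (9, 79), (17, 79), (7, 78), (15, 67)]

other_count = N_OBSERVATIONS - sum(count for _, count in top10)
other_distinct = N_DISTINCT - len(top10)
print(f"Other values ({other_distinct}): {other_count} ({other_count / N_OBSERVATIONS:.1%})")
```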
Minimum 10 values
Value                 Count   Frequency (%)
0                     37      0.6%
1                     110     1.9%
2                     92      1.6%
3                     98      1.7%
4                     105     1.8%
5                     52      0.9%
7                     78      1.4%
8                     82      1.4%
9                     79      1.4%
10                    86      1.5%
Maximum 10 values
Value                 Count   Frequency (%)
373                   23      0.4%
372                   23      0.4%
371                   17      0.3%
369                   4       0.1%
368                   13      0.2%
367                   16      0.3%
366                   15      0.3%
365                   19      0.3%
364                   11      0.2%
362                   7       0.1%

qtde_invoices
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.468010517
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 11
Maximum: 206
Range: 205
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.807826068
Coefficient of variation (CV): 1.963035012
Kurtosis: 302.5604586
Mean: 3.468010517
Median Absolute Deviation (MAD): 0
Skewness: 13.20280307
Sum: 19785
Variance: 46.34649578
Monotonicity: Not monotonic
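A MAD of 0 next to a standard deviation of 6.8 follows from the definition. Assuming the report's MAD is the unscaled median of absolute deviations from the median (consistent with df_index, whose MAD of 1446 is about a quarter of its range), more than half of the customers have exactly the median of 1 invoice, so more than half of the absolute deviations are 0. The sketch rebuilds the column from the value-count tables. The 332 "other" observations are placed at 11 as a placeholder, since the tables only show that they are at least 11.

```python
import statistics

def median_absolute_deviation(values):
    """Unscaled MAD: the median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

# qtde_invoices rebuilt from the value counts in this section. The 332
# "other" observations are all >= 11; 11 is a placeholder (it does not
# change the median or the MAD here).
counts = {1: 2876, 2: 829, 3: 504, 4: 394, 5: 237,
          6: 173, 7: 138, 8: 98, 9: 69, 10: 55, 11: 332}
qtde_invoices = [value for value, count in counts.items() for _ in range(count)]

print(len(qtde_invoices))                        # number of observations
print(statistics.median(qtde_invoices))          # median
print(median_absolute_deviation(qtde_invoices))  # MAD
```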
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
1                     2876    50.4%
2                     829     14.5%
3                     504     8.8%
4                     394     6.9%
5                     237     4.2%
6                     173     3.0%
7                     138     2.4%
8                     98      1.7%
9                     69      1.2%
10                    55      1.0%
Other values (46)     332     5.8%
Minimum 10 values
Value                 Count   Frequency (%)
1                     2876    50.4%
2                     829     14.5%
3                     504     8.8%
4                     394     6.9%
5                     237     4.2%
6                     173     3.0%
7                     138     2.4%
8                     98      1.7%
9                     69      1.2%
10                    55      1.0%
Maximum 10 values
Value                 Count   Frequency (%)
206                   1       < 0.1%
199                   1       < 0.1%
124                   1       < 0.1%
97                    1       < 0.1%
91                    2       < 0.1%
86                    1       < 0.1%
72                    1       < 0.1%
62                    2       < 0.1%
60                    1       < 0.1%
57                    1       < 0.1%

qtde_items
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1840
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 963.3996494
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 106
Median: 317
Q3: 804
95th percentile: 2926.4
Maximum: 196844
Range: 196843
Interquartile range (IQR): 698

Descriptive statistics

Standard deviation: 4296.512152
Coefficient of variation (CV): 4.459740207
Kurtosis: 864.8815057
Mean: 963.3996494
Median Absolute Deviation (MAD): 253
Skewness: 24.08929936
Sum: 5496195
Variance: 18460016.68
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
1                     114     2.0%
2                     73      1.3%
3                     51      0.9%
4                     49      0.9%
5                     35      0.6%
6                     29      0.5%
12                    25      0.4%
88                    22      0.4%
72                    21      0.4%
7                     20      0.4%
Other values (1830)   5266    92.3%
Minimum 10 values
Value                 Count   Frequency (%)
1                     114     2.0%
2                     73      1.3%
3                     51      0.9%
4                     49      0.9%
5                     35      0.6%
6                     29      0.5%
7                     20      0.4%
8                     18      0.3%
9                     7       0.1%
10                    17      0.3%
Maximum 10 values
Value                 Count   Frequency (%)
196844                1       < 0.1%
80263                 1       < 0.1%
77373                 1       < 0.1%
74215                 1       < 0.1%
69993                 1       < 0.1%
64549                 1       < 0.1%
64124                 1       < 0.1%
63312                 1       < 0.1%
58343                 1       < 0.1%
57885                 1       < 0.1%

qtde_products
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 529
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.53531989
Minimum: 1
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 14
Median: 41
Q3: 106
95th percentile: 331.8
Maximum: 7838
Range: 7837
Interquartile range (IQR): 92

Descriptive statistics

Standard deviation: 210.4062511
Coefficient of variation (CV): 2.273793957
Kurtosis: 511.099444
Mean: 92.53531989
Median Absolute Deviation (MAD): 33
Skewness: 17.76686848
Sum: 527914
Variance: 44270.79052
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
1                     256     4.5%
2                     149     2.6%
3                     108     1.9%
10                    101     1.8%
6                     99      1.7%
9                     92      1.6%
5                     91      1.6%
4                     87      1.5%
11                    84      1.5%
7                     83      1.5%
Other values (519)    4555    79.8%
Minimum 10 values
Value                 Count   Frequency (%)
1                     256     4.5%
2                     149     2.6%
3                     108     1.9%
4                     87      1.5%
5                     91      1.6%
6                     99      1.7%
7                     83      1.5%
8                     81      1.4%
9                     92      1.6%
10                    101     1.8%
Maximum 10 values
Value                 Count   Frequency (%)
7838                  1       < 0.1%
5673                  1       < 0.1%
5095                  1       < 0.1%
4580                  1       < 0.1%
2698                  1       < 0.1%
2379                  1       < 0.1%
2060                  1       < 0.1%
1818                  1       < 0.1%
1673                  1       < 0.1%
1637                  1       < 0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5511
Distinct (%): 96.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.69523826
Minimum: 0.42
Maximum: 77183.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 3.462222222
Q1: 7.95
Median: 15.85882353
Q3: 21.97516949
95th percentile: 76.24171429
Maximum: 77183.6
Range: 77183.18
Interquartile range (IQR): 14.02516949

Descriptive statistics

Standard deviation: 1042.883788
Coefficient of variation (CV): 23.33321913
Kurtosis: 5254.947144
Mean: 44.69523826
Median Absolute Deviation (MAD): 7.494723111
Skewness: 71.36561628
Sum: 254986.3343
Variance: 1087606.596
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
3.75                  11      0.2%
4.95                  10      0.2%
1.25                  9       0.2%
2.95                  9       0.2%
7.95                  8       0.1%
1.65                  7       0.1%
12.75                 7       0.1%
8.25                  7       0.1%
3.35                  6       0.1%
4.15                  6       0.1%
Other values (5501)   5625    98.6%
Minimum 10 values
Value                 Count   Frequency (%)
0.42                  3       0.1%
0.535                 1       < 0.1%
0.65                  1       < 0.1%
0.79                  1       < 0.1%
0.8371428571          1       < 0.1%
0.84                  2       < 0.1%
0.85                  3       0.1%
1.002222222           1       < 0.1%
1.02                  1       < 0.1%
1.03875               1       < 0.1%
Maximum 10 values
Value                 Count   Frequency (%)
77183.6               1       < 0.1%
13305.5               1       < 0.1%
4453.43               1       < 0.1%
3861                  1       < 0.1%
3202.92               1       < 0.1%
3096                  1       < 0.1%
1687.2                1       < 0.1%
1377.077778           1       < 0.1%
1001.2                1       < 0.1%
952.9875              1       < 0.1%

frequency
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1226
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5475900988
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0.005449591281
5th percentile: 0.01104972376
Q1: 0.025
Median: 1
Q3: 1
95th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.975

Descriptive statistics

Standard deviation: 0.5504405784
Coefficient of variation (CV): 1.005205499
Kurtosis: 138.6978064
Mean: 0.5475900988
Median Absolute Deviation (MAD): 0
Skewness: 4.846833994
Sum: 3124.001514
Variance: 0.3029848303
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
1                     2884    50.6%
2                     48      0.8%
0.0625                18      0.3%
0.02777777778         17      0.3%
0.02380952381         16      0.3%
0.08333333333         15      0.3%
0.09090909091         15      0.3%
0.03448275862         15      0.3%
0.02941176471         14      0.2%
0.07692307692         13      0.2%
Other values (1216)   2650    46.5%
Minimum 10 values
Value                 Count   Frequency (%)
0.005449591281        1       < 0.1%
0.005464480874        1       < 0.1%
0.005479452055        1       < 0.1%
0.005494505495        1       < 0.1%
0.005586592179        2       < 0.1%
0.005602240896        1       < 0.1%
0.005617977528        2       < 0.1%
0.00566572238         1       < 0.1%
0.005681818182        2       < 0.1%
0.005698005698        3       0.1%
Maximum 10 values
Value                 Count   Frequency (%)
17                    1       < 0.1%
4                     1       < 0.1%
3                     5       0.1%
2                     48      0.8%
1.142857143           1       < 0.1%
1                     2884    50.6%
0.75                  1       < 0.1%
0.6666666667          3       0.1%
0.550802139           1       < 0.1%
0.5335120643          1       < 0.1%

qtde_returns
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 214
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.2138475
Minimum: 0
Maximum: 74215
Zeros: 4200
Zeros (%): 73.6%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 38
Maximum: 74215
Range: 74215
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1003.441796
Coefficient of variation (CV): 32.14732808
Kurtosis: 5241.542158
Mean: 31.2138475
Median Absolute Deviation (MAD): 0
Skewness: 71.12000871
Sum: 178075
Variance: 1006895.439
Monotonicity: Not monotonic
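The zero-heavy distribution explains the quantiles above. Accumulating the counts of the smallest values (copied from the tables in this section) shows that zeros alone cover 73.6% of customers, so Q1 and the median are 0, and values up to 1 cover 76.6%, so Q3 is 1 and the IQR is 1. This is a sketch on the reported counts, not a recomputation from raw data; `cumulative_shares` is a hypothetical helper.

```python
# Smallest values of qtde_returns and their counts, copied from the report.
N_OBSERVATIONS = 5705
value_counts = [(0, 4200), (1, 169), (2, 151), (3, 105)]

def cumulative_shares(pairs, total):
    """(value, share of observations <= value) for (value, count) pairs sorted by value."""
    running = 0
    shares = []
    for value, count in pairs:
        running += count
        shares.append((value, running / total))
    return shares

shares = cumulative_shares(value_counts, N_OBSERVATIONS)
for value, share in shares:
    print(f"<= {value}: {share:.1%}")
```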
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
0                     4200    73.6%
1                     169     3.0%
2                     151     2.6%
3                     105     1.8%
4                     89      1.6%
6                     78      1.4%
5                     61      1.1%
12                    52      0.9%
7                     44      0.8%
8                     43      0.8%
Other values (204)    713     12.5%
Minimum 10 values
Value                 Count   Frequency (%)
0                     4200    73.6%
1                     169     3.0%
2                     151     2.6%
3                     105     1.8%
4                     89      1.6%
5                     61      1.1%
6                     78      1.4%
7                     44      0.8%
8                     43      0.8%
9                     41      0.7%
Maximum 10 values
Value                 Count   Frequency (%)
74215                 1       < 0.1%
9014                  1       < 0.1%
8004                  1       < 0.1%
4427                  1       < 0.1%
3768                  1       < 0.1%
3332                  1       < 0.1%
2878                  1       < 0.1%
2022                  1       < 0.1%
2012                  1       < 0.1%
1776                  1       < 0.1%

Interactions

[Figure: pairwise interaction plots between the 10 numeric variables (100 panels)]

Correlations

[Figure: correlation heatmap]

Auto

The auto setting is an easily interpretable pairwise metric that chooses the method from the variable types of each pair of columns: Cramér's V for categorical-categorical pairs, Cramér's V on a discretized numerical column for numerical-categorical pairs, and Spearman's ρ for numerical-numerical pairs. This configuration uses the most suitable method for each pair of columns.
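The auto mapping amounts to a small dispatch on column types. In this minimal sketch the type labels and returned method names are illustrative strings, not pandas-profiling identifiers. Because all 10 variables in this report are numeric, every pair in the Auto matrix falls into the Spearman case.

```python
def auto_correlation_method(type_a, type_b):
    """Pick a pairwise correlation method from two column types.

    Follows the mapping described above; "numerical", "categorical",
    "spearman" and "cramers_v" are illustrative labels only.
    """
    supported = {"numerical", "categorical"}
    if type_a not in supported or type_b not in supported:
        raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")
    if type_a == type_b == "numerical":
        return "spearman"
    # categorical-categorical, or numerical-categorical with the numerical
    # column discretized first
    return "cramers_v"

columns = ["df_index", "customer_id", "gross_revenue", "recency_days",
           "qtde_invoices", "qtde_items", "qtde_products", "avg_ticket",
           "frequency", "qtde_returns"]
column_types = {name: "numerical" for name in columns}  # all Numeric in this report

methods = {
    auto_correlation_method(column_types[a], column_types[b])
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
}
print(methods)
```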
[Figure: correlation heatmap]

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
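A minimal plain-Python sketch of that definition: values are converted to ranks (tied values receive the average of their positions, a common convention assumed here), then the Pearson formula is applied to the ranks. The example uses a monotonic but nonlinear relationship, y = x², for which ρ is 1.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def _pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (std_x * std_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return _pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys))
```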
[Figure: correlation heatmap]

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for an exactly linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
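A direct transcription of that definition, with a check of the invariance property: a steep line, a shallow line, and a negative-slope line give r = 1, 1 and -1 respectively, up to floating-point rounding. Population (divide-by-n) moments are used; the normalization cancels in the ratio, so sample moments would give the same r.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined for a constant input")
    return cov / (std_x * std_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(xs, [3 * x + 7 for x in xs]))    # steep positive line
print(pearson_r(xs, [0.1 * x - 2 for x in xs]))  # shallow positive line
print(pearson_r(xs, [-2 * x + 1 for x in xs]))   # negative slope
print(pearson_r(xs, [x ** 2 for x in xs]))       # monotonic but not linear: below 1
```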
[Figure: correlation heatmap]

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
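Counting concordant and discordant pairs directly gives the statistic described above, often called τ-a. This O(n²) sketch treats tied pairs as neither concordant nor discordant. Library implementations commonly use the tie-corrected τ-b instead (SciPy's kendalltau defaults to it), so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: a tie in x or y, counted as neither
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

Here five of the six pairs are concordant and one, the pair formed by the second and third observations, is discordant, so τ = (5 − 1) / 6 ≈ 0.667.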
[Figure: correlation heatmap]

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

      df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  qtde_products  avg_ticket  frequency  qtde_returns
   0         0        17850        5391.21         372.0           34.0      1733.0          297.0   18.152222  17.000000          40.0
   1         1        13047        3232.59          56.0            9.0      1390.0          171.0   18.904035   0.028302          35.0
   2         2        12583        6705.38           2.0           15.0      5028.0          232.0   28.902500   0.040323          50.0
   3         3        13748         948.25          95.0            5.0       439.0           28.0   33.866071   0.017921           0.0
   4         4        15100         876.00         333.0            3.0        80.0            3.0  292.000000   0.073171          22.0
   5         5        15291        4623.30          25.0           14.0      2102.0          102.0   45.326471   0.040115          29.0
   6         6        14688        5630.87           7.0           21.0      3621.0          327.0   17.219786   0.057221         399.0
   7         7        17809        5411.91          16.0           12.0      2057.0           61.0   88.719836   0.033520          41.0
   8         8        15311       60767.90           0.0           91.0     38194.0         2379.0   25.543464   0.243316         474.0
   9         9        16098        2005.63          87.0            7.0       613.0           67.0   29.934776   0.024390           0.0

Last rows

      df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  qtde_products  avg_ticket  frequency  qtde_returns
5695      5786        22700        4839.42           1.0            1.0      1074.0           62.0   78.055161        1.0           0.0
5696      5787        13298         360.00           1.0            1.0        96.0            2.0  180.000000        1.0           0.0
5697      5788        14569         227.39           1.0            1.0        79.0           12.0   18.949167        1.0           0.0
5698      5789        22704          17.90           1.0            1.0        14.0            7.0    2.557143        1.0           0.0
5699      5790        22705           3.35           1.0            1.0         2.0            2.0    1.675000        1.0           0.0
5700      5791        22706        5699.00           1.0            1.0      1747.0          634.0    8.988959        1.0           0.0
5701      5792        22707        6756.06           0.0            1.0      2010.0          730.0    9.254877        1.0           0.0
5702      5793        22708        3217.20           0.0            1.0       654.0           59.0   54.528814        1.0           0.0
5703      5794        22709        3950.72           0.0            1.0       731.0          217.0   18.206083        1.0           0.0
5704      5795        12713         794.55           0.0            1.0       505.0           37.0   21.474324        1.0           0.0
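In all 20 sample rows above, avg_ticket equals gross_revenue divided by qtde_products to the six displayed decimals, which suggests it is a derived column. The sketch below checks that on the sample rows only; whether the relationship holds for every customer is an assumption the sample cannot confirm.

```python
# (customer_id, gross_revenue, qtde_products, avg_ticket), copied from the sample rows above.
rows = [
    (17850, 5391.21, 297.0, 18.152222),
    (13047, 3232.59, 171.0, 18.904035),
    (12583, 6705.38, 232.0, 28.902500),
    (13748, 948.25, 28.0, 33.866071),
    (15100, 876.00, 3.0, 292.000000),
    (15291, 4623.30, 102.0, 45.326471),
    (14688, 5630.87, 327.0, 17.219786),
    (17809, 5411.91, 61.0, 88.719836),
    (15311, 60767.90, 2379.0, 25.543464),
    (16098, 2005.63, 67.0, 29.934776),
    (22700, 4839.42, 62.0, 78.055161),
    (13298, 360.00, 2.0, 180.000000),
    (14569, 227.39, 12.0, 18.949167),
    (22704, 17.90, 7.0, 2.557143),
    (22705, 3.35, 2.0, 1.675000),
    (22706, 5699.00, 634.0, 8.988959),
    (22707, 6756.06, 730.0, 9.254877),
    (22708, 3217.20, 59.0, 54.528814),
    (22709, 3950.72, 217.0, 18.206083),
    (12713, 794.55, 37.0, 21.474324),
]

# Displayed avg_ticket values are rounded to 6 decimals, so allow an error below 1e-6.
TOLERANCE = 1e-6
mismatches = [
    customer_id
    for customer_id, revenue, products, ticket in rows
    if abs(revenue / products - ticket) > TOLERANCE
]
print(f"{len(rows) - len(mismatches)} of {len(rows)} rows match; mismatches: {mismatches}")
```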